Create better primary keys for subtrees #2180
Attached issue: https://pulp.plan.io/issues/9566
@@ -0,0 +1,9 @@
Scenario:
Hey! Changelog entries are usually short descriptions. Could you replace all of this with something more like:
"Fixed a bug where sub-repos (distribution tree repos) could conflict with each other in common workflows."
@HolgerHees, great job finding a not-very-straightforward issue and a solution for it! Thanks a lot.
@@ -478,7 +478,7 @@ def is_subrepo(directory):
if repodata == DIST_TREE_MAIN_REPO_PATH:
treeinfo["repositories"].update({repodata: None})
continue
- name = f"{repodata}-{treeinfo['hash']}"
+ name = f"{repodata}-{treeinfo['hash']}-{repository_pk}"
@goosemania Question: We probably need to clean up the repos that use the old naming scheme somehow, and I can think of two ways to do that. Either we look for them under the old name in the sync code, which is the only way to backport this patch properly, or we do a migration. Or both.
I'm thinking we probably need to do both? Do we have enough information available to perform a rename in a migration in the first place?
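The first option, looking up sub-repos under the old name during sync, could look roughly like the sketch below. It is a minimal illustration only: the in-memory `repos` dict stands in for the database, and `find_or_rename_subrepo` and its helpers are hypothetical names, not pulp_rpm code.

```python
def old_subrepo_name(repodata: str, tree_hash: str) -> str:
    # Naming scheme before this PR: f"{repodata}-{treeinfo['hash']}"
    return f"{repodata}-{tree_hash}"


def new_subrepo_name(repodata: str, tree_hash: str, repository_pk: str) -> str:
    # Naming scheme introduced by this PR.
    return f"{repodata}-{tree_hash}-{repository_pk}"


def find_or_rename_subrepo(repos: dict, repodata: str, tree_hash: str, repository_pk: str):
    """Return the sub-repo under the new name, renaming a legacy one if found.

    `repos` maps repository name -> repository record (a plain dict here).
    Returns None if neither name exists, so the caller would create a new one.
    """
    new_name = new_subrepo_name(repodata, tree_hash, repository_pk)
    if new_name in repos:
        return repos[new_name]
    old_name = old_subrepo_name(repodata, tree_hash)
    if old_name in repos:
        # Legacy sub-repo: move it to the new name so later syncs find it directly.
        repo = repos.pop(old_name)
        repo["name"] = new_name
        repos[new_name] = repo
        return repo
    return None
```

Note that in the colliding scenario from this issue, whichever repository syncs first would claim the legacy sub-repo in this sketch, and the other would get `None` and create a fresh one.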
I'm not sure we need to backport it. I wonder whether it's possible at all to get into this situation using Katello, and Pulp upstream users should be able to upgrade.
I'm in favor of a migration and cleaner code (to avoid introducing new bugs by handling two different naming schemes). There is one potential problem, though the probability is extremely low: if someone manually named another repo the way the patch suggests, we'll run into a conflict during the migration. (The probability is higher if someone applied this patch as-is and later decided to upgrade.)
As for the information, I think we have enough. Which part are you concerned about? We need to look for repos that have user_hidden=True and whose names do not end with repository_pk, and append -<repository_pk> to those. Am I missing anything?
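The check described above can be sketched as a pure function that plans the renames without touching the database. `Repo` and `plan_subrepo_renames` are hypothetical stand-ins, not Pulp models or migration code. The sketch assumes `repository_pk` means each hidden sub-repo's own PK, and it reports the name conflict mentioned earlier in this comment instead of failing mid-migration.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    pk: str
    name: str
    user_hidden: bool


def plan_subrepo_renames(repos: list[Repo]) -> tuple[dict[str, str], list[str]]:
    """Compute old-name -> new-name renames for hidden sub-repos.

    Assumption: the suffix is each hidden repo's own pk.
    Returns (renames, conflicts); conflicts lists target names that are
    already taken by another repository.
    """
    taken = {r.name for r in repos}
    renames: dict[str, str] = {}
    conflicts: list[str] = []
    for repo in repos:
        if not repo.user_hidden or repo.name.endswith(f"-{repo.pk}"):
            continue  # visible repo, or already uses the new scheme
        new_name = f"{repo.name}-{repo.pk}"
        if new_name in taken:
            # e.g. someone manually created a repo with this exact name
            conflicts.append(new_name)
        else:
            renames[repo.name] = new_name
            taken.add(new_name)
    return renames, conflicts
```

Planning first and applying second makes it possible to abort (or report) before any rename happens if `conflicts` is non-empty.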
> There is one potential problem but the probability is extremely low - if someone named some other repo manually like suggested in the patch, we'll run into conflict during a migration. (The probability is getting higher if someone applied this patch as is and later decided to upgrade).
We should probably consider adding user_hidden to the uniqueness constraint.
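To illustrate what adding user_hidden to the uniqueness constraint would change, here is a small sketch using SQLite from the Python standard library. The `repository` table is a hypothetical simplification, not Pulp's schema; in Pulp itself this would presumably be a model constraint change plus a migration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: names are unique per user_hidden value, not globally.
conn.execute(
    "CREATE TABLE repository ("
    " pk INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " user_hidden INTEGER NOT NULL,"
    " UNIQUE (name, user_hidden))"
)

# A visible repo and a hidden sub-repo may now share a name...
conn.execute("INSERT INTO repository (name, user_hidden) VALUES ('addon-abc', 0)")
conn.execute("INSERT INTO repository (name, user_hidden) VALUES ('addon-abc', 1)")

# ...but two hidden sub-repos with the same name are still rejected.
try:
    conn.execute("INSERT INTO repository (name, user_hidden) VALUES ('addon-abc', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

With such a constraint, a manually named visible repo could no longer block a hidden sub-repo's name, while collisions among hidden sub-repos themselves would still need the PK suffix.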
> Do I miss anything?
No, I suppose the question is just whether we know what the base repository PK is for any arbitrary sub-repo. Or is it the sub-repo's repository PK?
Heh, I had exactly the same thoughts, in that order :). I thought it was the base repo PK (I figured out how to get it, but it's painful: you go through every dist tree and check which repos its addons and variants refer to), but then, looking at the code, it seems to be the sub-repo PK.
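The "painful" lookup described above amounts to inverting the dist tree references. A minimal sketch, assuming a hypothetical plain-dict shape (base repo PK to a list of dist trees, each with `addons` and `variants` entries carrying a `repository` PK); this is not the real pulp_rpm data model or query.

```python
def map_subrepos_to_base(base_repo_trees: dict[str, list[dict]]) -> dict[str, set[str]]:
    """Map each sub-repo pk to the base repo pks whose dist trees refer to it.

    `base_repo_trees` maps a base repo pk to its distribution trees; each tree is
    a dict with "addons" and "variants" lists whose items carry a "repository" pk.
    """
    mapping: dict[str, set[str]] = {}
    for base_pk, trees in base_repo_trees.items():
        for tree in trees:
            # Addons and variants both point at sub-repos.
            for item in tree.get("addons", []) + tree.get("variants", []):
                mapping.setdefault(item["repository"], set()).add(base_pk)
    return mapping
```

The result is a set per sub-repo because, in the scenario from this issue, a shared sub-repo could be referenced by both the staging and the production mirror, which makes the base repo PK an awkward key for a migration.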
I think I'm OK with merging this as-is, as long as we find out before the release whether it might need to be backported. I don't want to end up in a situation where we do need to backport and then have to go back and change the released migration to accommodate that. Unless we can write the migration in an agnostic way to begin with.
Are you suggesting we merge as-is now and make further code changes before the release (depending on whether backports are needed, either the migration approach or support for both naming schemes)? If so, maybe we need an issue that blocks the upcoming release, so we don't forget about it.
The main consideration is simply to not make the author write a migration for us too :) I agree about filing an issue though.
Or I can just close this and re-open it to write the migration myself. I guess I'll do that.
@HolgerHees Thanks for this! I'm not rejecting it; I'm just going to make some changes and merge a new PR instead of this one.
Scenario:
My primary Pulp instance hosts two distributions of the same repository (e.g. staging and production), each of which can reference a different version of that repository. During my initial run, both distributions point to version 1. So far so good.
I also have a secondary Pulp instance that mirrors the two primary distributions, using a separate remote and repository for each.
The repository on the primary node now contains a subtree that is identical in version 1 for staging and production, so it has the same hash.
During the sync, the metadata for this subtree is stored under a primary key of the form "{repodata}-{treeinfo['hash']}". The staging and production mirrors collide on this key, because the subtree's content, and therefore its hash, is the same for both. The key should be something like "{repodata}-{treeinfo['hash']}-{repository_pk}".
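The collision can be reproduced with plain string formatting. The repodata name, hash, and PK values below are made up for illustration; only the two key formats come from this PR.

```python
def old_key(repodata: str, tree_hash: str) -> str:
    # Before: f"{repodata}-{treeinfo['hash']}"
    return f"{repodata}-{tree_hash}"


def new_key(repodata: str, tree_hash: str, repository_pk: str) -> str:
    # After: f"{repodata}-{treeinfo['hash']}-{repository_pk}"
    return f"{repodata}-{tree_hash}-{repository_pk}"


# Hypothetical values: one identical subtree synced into two mirror repositories.
repodata, tree_hash = "addon", "3f2a9c"
mirrors = {"staging": "pk-staging", "production": "pk-production"}

old_keys = {name: old_key(repodata, tree_hash) for name in mirrors}
new_keys = {name: new_key(repodata, tree_hash, pk) for name, pk in mirrors.items()}

print(len(set(old_keys.values())))  # 1: both mirrors map to the same sub-repo
print(len(set(new_keys.values())))  # 2: each mirror gets its own sub-repo
```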
closes: #9566
https://pulp.plan.io/issues/9566